Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trm 30823 UniParc proteome redundant fasta #249

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

LeonardoGonzales
Copy link
Contributor

No description provided.

# Conflicts:
#	controlled-vocabulary/src/test/resources/xdb/dr_ord
#	core-parser/src/main/java/org/uniprot/core/parser/fasta/uniparc/UniParcFastaParser.java
#	core-parser/src/test/java/org/uniprot/core/parser/fasta/UniParcFastaParserTest.java
# Conflicts:
#	xml-parser/src/main/java/org/uniprot/core/xml/uniparc/UniParcDBCrossReferenceConverter.java
@ahmadshadab
Copy link
Contributor

I am going to merge with main before review.

VECTORBASE(3500, "VectorBase", false, false),
VEGA(3600, "VEGA", true, false, "https://vega.sanger.ac.uk/id/%id"),
WORMBASE_PARASITE(3700, "WBParaSite", true, true, "https://parasite.wormbase.org/id/%id"),
WORMBASE(3800, "WormBase", true, true,"https://wormbase.org/db/seq/protein?name=%id;class=CDS");

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we know if a database is source? Could you please add source of this information in this class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jie provided the list of databases that are sources in UniParc.


private static final Set<UniParcDatabase> uniProtDatabases = Set.of(
UniParcDatabase.SWISSPROT, UniParcDatabase.TREMBL, UniParcDatabase.SWISSPROT_VARSPLIC);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use uppercase for const.

@@ -11,6 +11,8 @@
*/
public interface UniParcCrossReference extends CrossReference<UniParcDatabase> {

Copy link
Contributor

@ahmadshadab ahmadshadab Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add formula and an example how we construct source. :: and eg. ABC01415:UP000005640:Chromosome 1

hasActive = true;
}
} else {
sourceXrefs.add(xref);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are assuming that database type other than uniProtDatabases are source xref. But in UniParcDatabase,java i can see other non-source xrefs e.g. EPO, FLYbase. Will they not be added as sourceXrefs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are doing a filter in the REST side (we filter proteome Id). I can add an else if for sources if you prefer

@@ -0,0 +1,134 @@
package org.uniprot.core.parser.fasta.uniparc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add few comments in this class.

}

private static StringBuilder getFastaHeader(List<UniParcCrossReference> xrefs, boolean hasActive, String id, String proteomeId, boolean isSource) {
Set<String> proteinName = new LinkedHashSet<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use plural name for collection proteinNames and components

if(!component.isEmpty()){
sb.append(":").append(String.join("|", component));
}
return sb;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move "|" to a constant.

if(!sourceIds.isEmpty()){
sb.append(" SS=").append(String.join("|", sourceIds));
}
sb.append(" PC=").append(proteomeId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

proteomeId can be null

component.add(ids[2]);
}
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract the logic for processing xref.getProperties() into a helper method, simplifying the loop body.
We should keep the lambda small. we can extract the for(source:sources) logic another method.

@@ -47,6 +49,9 @@ public static void populateUniParcCrossReferenceBuilder(String propertyType, Str
case PROPERTY_UNIPROTKB_ACCESSION:
builder.propertiesAdd(PROPERTY_UNIPROTKB_ACCESSION, propertyValue);
break;
case PROPERTY_SOURCES:
builder.propertiesAdd(PROPERTY_SOURCES, propertyValue);
break;
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not covered this case in test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants